11. Analyze Data

Let's assume that the experiment was given the green light to go ahead, and
data was collected for 29 days. As a reminder of the discussion on experiment
sizing, it was found that a three-week period was needed to collect enough
visitors to achieve our desired power level. Eight additional days of
collection were added so that visitors in the last week could complete their
trials and come back to make a purchase. If you look at the data linked in the
next paragraph, you will see that it takes about eight days before license
purchases reach their steady level.

The collected data can be found here.
The data file reports the daily counts for the number of unique cookies, number
of downloads, and number of license purchases attributed to each group: the
experimental group with the new homepage, and the control group with the old
homepage. The number of license purchases only includes purchases by users who
joined after the start of the experiment, so there will be some time before the
counts reach their steady state. As noted earlier, we'll assume that the
potentially muddying effects of visits across multiple days, established user
visits, and 'lost' cookie tracking are negligible, at least unless we find
reason to doubt our findings.

Invariant Metric

First, we should check our invariant metric, the number of cookies assigned to
each group. If there is a statistically significant difference detected, then
we shouldn't move on to the evaluation metrics right away. We'd need to first
dig deeper to see if there was an issue with the group-assignment procedure, or
if there is something about the manipulation that affected the number of
cookies observed, before we feel secure about analyzing and interpreting the
evaluation metrics.
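
One way to run this check is sketched below. It assumes that each cookie is meant to be assigned to either group with equal probability, and it uses a normal approximation to the binomial distribution. The function name and the totals are made up for illustration; they are not taken from the data file.

```python
from math import sqrt
from statistics import NormalDist


def invariant_p_value(n_control, n_experiment):
    """Two-sided p-value for H0: a cookie is equally likely to land in either group."""
    n_total = n_control + n_experiment
    expected = 0.5 * n_total            # expected experiment-group count under H0
    sd = sqrt(n_total * 0.5 * 0.5)      # binomial standard deviation under H0
    z = (n_experiment - expected) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical cookie totals, NOT the values from the experiment's data file.
p = invariant_p_value(10_000, 10_150)
print(f"p = {p:.3f}")
```

A small p-value here would be a reason to investigate the assignment procedure before looking at the evaluation metrics.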

Evaluation Metrics

Assuming that the invariant metric passed inspection, we can move on to the
evaluation metrics: download rate and license purchasing rate. As a refresher,
the download rate is the total number of downloads divided by the number of
cookies, and the license purchasing rate is the number of license purchases
divided by the number of cookies.
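
Both metrics are proportions with cookies as the denominator, so one common way to compare the groups is a pooled two-proportion z-test. The sketch below is one possible approach, not necessarily the exact procedure intended here. The function name and example counts are hypothetical, and the p-value is two-sided; a one-sided test would use only the tail in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_control, n_control, x_experiment, n_experiment):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_control = x_control / n_control
    p_experiment = x_experiment / n_experiment
    # Pooled proportion under H0: both groups share the same underlying rate.
    p_pool = (x_control + x_experiment) / (n_control + n_experiment)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_experiment))
    z = (p_experiment - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical download and cookie totals, NOT the experiment's real numbers.
z, p_value = two_proportion_z(1_600, 10_000, 1_750, 10_150)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```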

One tricky point to consider is that there is a seven- or eight-day delay
between when most people download the software and when they make a purchase.
Because results are aggregated by day, there's no direct way of attributing
individual cookies all the way through to license purchases, so the best we can
do is to make a justified argument for handling the data. To answer the
question below about the license purchasing rate, use only the cookies observed
through day 21 as the denominator of the ratio, and treat those cookies as
responsible for all of the license purchases observed over the full 29 days. (A
more informed model of license purchasing could come up with a different
handling of the data, such as including part of the day 22 cookies in the
denominator.) Note that we don't need to perform this kind of correction for
the download rate, since the link between homepage visits and downloads is much
closer.
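
The day-21 cutoff can be sketched as follows, assuming the daily counts for one group have been loaded into two lists ordered by day. The function name, parameter names, and the toy numbers are hypothetical and do not reflect the layout or contents of the data file.

```python
def license_rate(daily_cookies, daily_licenses, last_cookie_day=21):
    """License purchasing rate using only cookies seen through `last_cookie_day`.

    Both arguments are lists of daily counts, with index 0 holding day 1.
    """
    cookies = sum(daily_cookies[:last_cookie_day])  # days 1..21 only
    licenses = sum(daily_licenses)                  # every collected day
    return licenses / cookies


# Toy data: 29 days with a constant 100 cookies and 1 license per day.
rate = license_rate([100] * 29, [1] * 29)
print(f"license purchasing rate = {rate:.4f}")
```

The same function would be applied to each group separately, and the resulting counts (total licenses, cookies through day 21) compared between groups.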